Extended Abstract


Abstract 

A  WebView  is  a  web  page  that  is  automatically  created  from  base  data,  which  are  usually 
drawn  from  a  DBMS.  A  WebView  can  be  either  materialized  as  an  html  page  at  the  web  server, 
or  virtual,  always  being  computed  on-the-fly.  For  the  materialized  case,  updates  to  base  data 
lead  to  immediate  recomputation  of  the  WebView,  whereas  in  the  virtual  case,  recomputation  is 
done  on  demand  with  each  request.  We  introduce  the  materialize  on-demand  approach  which 
combines  the  two  strategies,  and  generates  WebViews  on  demand,  but  also  stores  the  results 
and  re-uses  them  in  the  future  if  possible.  Deciding  on  one  of  the  three  materialization  policies 
for  each  WebView  is  clearly  a  performance  issue.  In  this  paper,  we  give  the  framework  for  the 
problem  and  provide  a  cost  model,  which  we  test  with  experiments  on  a  real  web  server. 
1 Introduction


The web is increasingly being used as the means to do everyday tasks, from reading the newspaper to
shopping or paying bills. One common denominator for all these activities is that the corresponding
web sites provide some sort of personalization, tailored to the style and needs of each individual
([B+98]). Personalized web pages that are automatically created from base data are one of the
many instances of WebViews. In general, we define WebViews as web pages that are automatically
constructed from “base data” using a program or a DBMS query.

Similarly to traditional database views, WebViews can be in two forms: virtual or materialized.
Virtual WebViews are computed dynamically on demand, usually by a CGI script, whereas materialized
WebViews are pre-computed and stored as static HTML pages. In the virtual case, the cost
to compute the WebView increases the query response time¹. On the other hand, in the materialized
case, every update to the base data leads to a WebView refresh, which increases the server load.

Having a WebView materialized can potentially give significantly lower query response times,
provided that the update workload is not heavy. Even if the WebView computation is not very
expensive, by keeping it materialized we eliminate the latency of going to the DBMS every time,
which could lead to DBMS overloading ([Sin98]). However, if the update workload is heavy, having
the WebView materialized can lead to a degradation in performance, as every update will cause a
refresh. In this case, deferring the updates until query time ([RK86]) is the best solution.
Clearly, the decision whether to keep a WebView materialized or virtual at the server, the WebView
materialization problem, is a performance issue.

WebView materialization is different from traditional web caching: WebView materialization
aims at eliminating the processing time needed for repeated generation, whereas web caching strives
to eliminate unnecessary data transmissions across the network ([Mal98]). Also, WebView materialization
is performed at the web server, whereas web caching is done at the clients or at proxies.
Although different, both techniques improve web server performance.

The WebView materialization problem is similar to that of deciding which views to materialize
in a data warehouse ([GM95, Gup97, Rou98]), known as the view selection problem. There are,
however, many differences. First of all, although both problems aim at decreasing query response
times, warehouse views are materialized in order to speed up the execution of a few long analytical
(OLAP) queries, whereas WebViews are materialized to avoid repeated execution of many small
OLTP-style queries. Secondly, since WebViews are defined after user requests, unlike warehouse
views, the resulting search space for the decision problem is significantly smaller. Moreover, we
have accurate statistics on the access and update frequencies for all the WebViews from the web
server logs. Thirdly, WebView materialization means avoiding an extra layer of software (i.e. generating
the WebView by executing a program or a DBMS query), whereas in the warehouse case
one always has to issue a query to the DBMS. Finally, the general case of the WebView materialization
problem has no constraints, whereas most view selection algorithms impose some resource
constraints (e.g. space requirement or maintenance time allowed [KR99]).

¹ We use the term queries to refer to web page requests for a particular WebView.

In the web context, although there is a lot of recent literature on building and maintaining web
sites ([AMM98, FFK+98]), on querying the web ([MMM97, FLM98, GW98]) and on integrating
heterogeneous data sources ([CDSS98, MZ98]), there is very little work on the performance issues
associated with materializing WebViews. [MMM98] provide an algorithm to support client-side
materialization of WebViews, and [Sin98], [AMR+98] present algorithms to maintain them incrementally.
However, to the best of our knowledge, this is the first attempt to provide a quantitative
solution to the problem of deciding the best materialization strategy for WebViews.

In this paper we give the framework for the WebView materialization problem, and also propose
a hybrid approach that combines the advantages of both the virtual and the materialized approaches
(Section 2). We also present an approximate cost model for the WebView materialization problem
in Section 3. Finally, we present the results of our experiments on a real web server in Section 4 and
our conclusions in the last section.


2 WebView Materialization Problem

When a WebView is materialized, any update to the base data leads to an immediate refresh of the
derived WebView (in addition to the update to the underlying DBMS). The refresh can either be
incremental or a complete recomputation². Requests for such a WebView, however, are very fast,
since the page is pre-computed. On the other hand, virtual WebViews are always generated on-the-fly.
This means that updates to the base data are only applied to the DBMS, but queries have to wait for
the WebView to be recomputed every time. Clearly, both of these approaches can cause significant
performance degradation if not used properly (e.g. if materializing a WebView that has a lot of
updates and very few requests).

² Since the materialized WebView is in HTML format, it is difficult to do an incremental refresh, although not impossible.
For the remainder of the paper, we assume that a complete recomputation takes place after every update to the
base data.

There is another, hybrid alternative: generate the WebView on demand (like the virtual approach),
but also store the results and re-use them in the future (like the materialized approach), if
possible. We call this approach materialize on-demand. Under this strategy, an update to the base
data must invalidate the derived WebView (but it will not cause a refresh). When the server gets a
request for the WebView, it first checks whether it has been invalidated and, if not, simply returns
the saved copy. If the view has been invalidated since the last time it was saved, then the server
needs to generate the WebView and save it again.
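The request and update handling just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used in the paper: the class name `OnDemandStore`, the `generate` callback (standing in for the program or DBMS query that builds the page) and the in-memory dictionaries (standing in for pages saved on disk) are all hypothetical.

```python
from typing import Callable, Dict


class OnDemandStore:
    """Illustrative materialize on-demand bookkeeping for a set of WebViews."""

    def __init__(self, generate: Callable[[str], str]) -> None:
        self._generate = generate           # hypothetical: runs the query, renders HTML
        self._saved: Dict[str, str] = {}    # stands in for the copies saved on disk
        self._valid: Dict[str, bool] = {}   # invalidation flag per WebView
        self.generations = 0                # number of recomputations so far

    def on_update(self, view_id: str) -> None:
        """An update to the base data only invalidates; it never refreshes."""
        self._valid[view_id] = False

    def on_request(self, view_id: str) -> str:
        """Return the saved copy if it is still valid, else regenerate and save."""
        if self._valid.get(view_id, False):
            return self._saved[view_id]
        page = self._generate(view_id)
        self.generations += 1
        self._saved[view_id] = page
        self._valid[view_id] = True
        return page


# Usage: two requests with no update in between trigger a single generation.
base_data = {"v1": 1}
store = OnDemandStore(lambda view_id: f"<html>{base_data[view_id]}</html>")
store.on_request("v1")
store.on_request("v1")
```

Several updates in a row cost only a flag write each, and the recomputation is paid once, by the next request. The sketch ignores concurrency (for instance, an update arriving while a page is being regenerated), which a real server would have to handle.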

We  formulate  the  WebView  materialization  problem  as: 

For every WebView at the server, select the materialization strategy (virtual, materialized,
materialized on-demand) that minimizes the average query response time on
the clients. We assume that there is no storage constraint at the server.

The assumption that there is no storage constraint on the server is not unrealistic, since, in our case,
storage means disk space (and not main memory) and also WebViews are expected to be relatively
small³. In this paper, we also assume a no staleness requirement, i.e. the WebViews must always be
up to date. This is a reasonable requirement, since users would rather access fresh data.

Clearly, the WebView materialization decision is heavily dependent on the update and access
patterns for the WebViews, whereas the calculation cost and the size could also play some role.
There are some classes of WebViews for which a straightforward solution to the materialization
problem exists. For example, WebViews with a lot of requests which do not get a lot of updates
should be materialized, since keeping them up-to-date “pays off” because of the high access rate.
An example of this scenario is the web pages in Yahoo (http://www.yahoo.com), which don’t
get many updates and are thus kept as HTML pages. On the other hand, WebViews with a lot of
updates and infrequent access should be virtual, since the overhead of keeping them fresh is not
warranted by the number of requests. An example of this case is a personalized stock portfolio
page from a web site offering real-time stock market data. Since the update frequency is very high
(stock prices can change many times in a second), the corresponding WebView would have to be
virtual and generated on demand using CGI scripts.

Although some classes of WebViews have straightforward solutions to the materialization problem,
this is not the case in general. To find an analytical solution to the WebView materialization
problem, we have developed a cost model, which we present in the next section.

3  Cost  model 

We want to compare the three different materialization policies (materialized, virtual, materialized
on-demand) for a WebView $v_i$ and decide which one will lead to smaller query response times under
given workload conditions. We calculate the cost to the server under each materialization policy for
$v_i$, distinguishing between the update cost, $C^u$, which is the load on the server caused by
the application of the updates, and the access cost, $C^a$, which is the load on the server caused by the
user requests for $v_i$. We expected, and verified in the experiments, that the query response times are
more “sensitive” to the access cost, since it is in the “critical path” of each request. Let
$f_a(v_i)$ be the WebView access frequency given in some unit of time, say minutes, and $f_u(v_i)$ be the
WebView update frequency. $f_a(v_i)$ includes requests to WebView $v_i$ from all clients. Finally, let the
cost to recompute $v_i$ be $c_{gen}(v_i)$.

³ The average web page is 30KB ([AW97]), so a single 50GB hard disk, for example, could hold approximately 1.5
million pages.

Materialized Policy If $v_i$ is kept materialized, then the cumulative update cost is

$$C^{u}_{mat}(v_i) = f_u(v_i) \times \bigl(c_{gen}(v_i) + c_w(v_i)\bigr) \tag{1}$$

where $c_w(v_i)$ is the cost to write $v_i$ to disk⁴.

The cumulative access cost, if $v_i$ is kept materialized, is:

$$C^{a}_{mat}(v_i) = f_a(v_i) \times c_r(v_i) \tag{2}$$

where $c_r(v_i)$ is the cost to read $v_i$ from disk⁴.

Since  we  wish  to  minimize  the  average  query  response  time,  we  must  give  more  weight  to  the 
access  cost  than  the  update  cost,  because  the  access  cost  has  a  direct  effect  on  the  response  time, 
whereas  the  update  cost  has  only  an  indirect  effect  (by  increasing  the  server  load).  This  asymmetry 
is  due  to  the  fact  that  a  request  for  a  web  page  can  be  serviced  while  an  update  on  the  same  page 
takes  place,  in  other  words  there  is  no  locking  or  blocking  on  typical  web  servers. 

Following this idea, to get the overall cost for keeping $v_i$ materialized, we introduce a weight
factor for the access cost, $\alpha > 1$, which is expected to be platform-dependent. The total cost is:

$$C_{mat}(v_i) = C^{u}_{mat}(v_i) + \alpha \times C^{a}_{mat}(v_i) \tag{3}$$

Virtual Policy In contrast to the materialized strategy, if $v_i$ is kept virtual, there will be no update
cost whatsoever⁵. Therefore

$$C^{u}_{virt}(v_i) = 0 \tag{4}$$

On the other hand, the cumulative access cost if $v_i$ is kept virtual is:

$$C^{a}_{virt}(v_i) = f_a(v_i) \times c_{gen}(v_i) \tag{5}$$

where $c_{gen}(v_i)$ is the cost to recompute $v_i$, and, in this case, it is “suffered” by every query.

As in the materialized case, we have to use $\alpha$ when calculating the total cost for the virtual
policy:

$$C_{virt}(v_i) = C^{u}_{virt}(v_i) + \alpha \times C^{a}_{virt}(v_i) \tag{6}$$

⁴ For simplicity one could assume that all WebViews have approximately the same size, kept constant in the presence
of updates, and hence the cost to read or write a WebView to disk would be the same for all.

⁵ We do not take into account the cost to update the base data, since this will be the same with all three materialization
policies.

Materialized On-Demand Policy If $v_i$ is kept materialized on-demand, then the cumulative update
cost is only the invalidation cost:

$$C^{u}_{mod}(v_i) = f_u(v_i) \times c_{inv} \tag{7}$$

where $c_{inv}$ is the cost to invalidate one WebView.

The access cost of $v_i$ under a materialized on-demand policy includes a lookup cost on every request,
when the server checks whether the WebView has been invalidated. Furthermore, for every update
we also have to include the WebView generation cost plus a small cost to save the WebView to
disk. When the access frequency is higher than the update frequency for $v_i$, we expect to have better
performance compared to the virtual strategy, since in that case we only pay the cost of reading the
saved WebView from disk for the extra accesses, instead of recomputing it all the time. Here is the
upper bound⁶ for the access cost to $v_i$ under a materialized on-demand policy:

$$C^{a}_{mod}(v_i) = f_a(v_i) \times c_{chk} + f_u(v_i) \times \bigl(c_{gen}(v_i) + c_w(v_i)\bigr) + b \times \bigl(f_a(v_i) - f_u(v_i)\bigr) \times c_r(v_i) \tag{8}$$

where $c_{chk}$ is the cost to check if one WebView has been invalidated, and $b$ is 1 if $f_a(v_i) > f_u(v_i)$,
and 0 otherwise.
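To see why only an upper bound can be given, the small Python sketch below counts how many regenerations a given interleaving of updates and accesses actually triggers under materialize on-demand. The event encoding (`"u"` for an update, `"a"` for an access) and the assumption that a valid saved copy exists at the start are ours, chosen purely for illustration.

```python
def count_generations(events: str) -> int:
    """Count regenerations for a sequence of 'u' (update) and 'a' (access) events.

    Assumes (hypothetically) that a valid saved copy exists before the first event.
    """
    valid = True
    generations = 0
    for event in events:
        if event == "u":
            valid = False              # an update only invalidates the saved copy
        elif event == "a":
            if not valid:              # first access after an update pays c_gen
                generations += 1
                valid = True
        else:
            raise ValueError(f"unknown event: {event!r}")
    return generations


# Same frequencies (3 updates, 3 accesses), different interleavings.
bursty = count_generations("uuuaaa")       # updates arrive back to back
alternating = count_generations("uauaua")  # every update is followed by an access
```

Both sequences have the same update and access frequencies, yet the bursty one needs a single regeneration while the alternating one needs one per update. The bound charges one generation per update, which matches the alternating case.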

Finally, the total cost for the materialized on-demand policy is:

$$C_{mod}(v_i) = C^{u}_{mod}(v_i) + \alpha \times C^{a}_{mod}(v_i) \tag{9}$$

WebView Materialization Problem Let $V$ be the set of WebViews in our system and let $V_{mat}$ be
the subset of WebViews that are materialized, $V_{virt}$ the ones that are virtual, and $V_{mod}$ the ones that
are materialized on-demand. The total cost would be:

$$C_{total} = \sum_{v_i \in V_{mat}} C_{mat}(v_i) + \sum_{v_j \in V_{virt}} C_{virt}(v_j) + \sum_{v_k \in V_{mod}} C_{mod}(v_k) \tag{10}$$

With the help of Equations 1-10 we can rephrase the WebView materialization problem as:
partition $V$ into $V_{mat}$, $V_{virt}$, $V_{mod}$, such that $C_{total}$ is minimized.
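As a rough illustration of how the cost model could be evaluated, the Python sketch below codes Equations 3, 6 and 9 and picks the cheapest policy for each WebView. Since the total cost is a sum of per-WebView terms and the model has no constraint coupling the views, choosing the minimum for each view independently also minimizes the total under the model as written. The names `ViewStats` and `choose_policy`, and every numeric value in the usage lines, are illustrative assumptions rather than measurements from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ViewStats:
    f_a: float    # access frequency (requests per unit of time)
    f_u: float    # update frequency (updates per unit of time)
    c_gen: float  # cost to recompute the WebView
    c_r: float    # cost to read the saved page from disk
    c_w: float    # cost to write the page to disk


def cost_mat(v: ViewStats, alpha: float) -> float:
    """Equations 1-3: refresh on every update, read from disk on every access."""
    return v.f_u * (v.c_gen + v.c_w) + alpha * v.f_a * v.c_r


def cost_virt(v: ViewStats, alpha: float) -> float:
    """Equations 4-6: no update cost, regenerate on every access."""
    return alpha * v.f_a * v.c_gen


def cost_mod(v: ViewStats, alpha: float, c_inv: float, c_chk: float) -> float:
    """Equations 7-9, using the upper bound of Equation 8 for the access cost."""
    b = 1 if v.f_a > v.f_u else 0
    access = (v.f_a * c_chk
              + v.f_u * (v.c_gen + v.c_w)
              + b * (v.f_a - v.f_u) * v.c_r)
    return v.f_u * c_inv + alpha * access


def choose_policy(v: ViewStats, alpha: float, c_inv: float, c_chk: float) -> str:
    """Return the name of the policy with the smallest modelled cost."""
    costs = {
        "materialized": cost_mat(v, alpha),
        "virtual": cost_virt(v, alpha),
        "materialized on-demand": cost_mod(v, alpha, c_inv, c_chk),
    }
    return min(costs, key=costs.get)


# Hypothetical inputs: a frequently read, rarely updated view and the reverse.
hot = ViewStats(f_a=100, f_u=1, c_gen=0.5, c_r=0.01, c_w=0.01)
cold = ViewStats(f_a=1, f_u=100, c_gen=0.5, c_r=0.01, c_w=0.01)
policies = [choose_policy(v, alpha=2.0, c_inv=0.001, c_chk=0.001) for v in (hot, cold)]
```

With these made-up inputs the model selects the materialized policy for the frequently read view and the virtual policy for the frequently updated one. Because Equation 8 is an upper bound, the sketch may overstate the cost of materialize on-demand when updates arrive in bursts.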

By default, web servers log all page requests, so estimating $f_a(v_i)$ for any WebView $v_i$ is not
difficult. Calculating $f_u(v_i)$ and the rest of the costs is easy too, and, to make things even easier,
we can assume that $c_r(v_i)$ and $c_w(v_i)$ will be approximately the same for all WebViews.

⁶ We cannot calculate the exact cost, as it depends on the interleaving of updates and accesses to $v_i$.

To get some intuition behind the formulas, we assume that all costs, except for the generation cost
$c_{gen}(v_i)$, are small and constant. Under this assumption, we can get some approximations for
$C_{mat}(v_i)$, $C_{virt}(v_i)$, $C_{mod}(v_i)$:

$$\begin{aligned}
C_{mat}(v_i) &= f_u(v_i) \times c_{gen}(v_i) \\
C_{virt}(v_i) &= \alpha \times f_a(v_i) \times c_{gen}(v_i) \\
C_{mod}(v_i) &= \alpha \times f_m(v_i) \times c_{gen}(v_i) + c_{over}
\end{aligned} \tag{11}$$

where $f_m(v_i)$ is $\min(f_a(v_i), f_u(v_i))$, and $c_{over}$ is a composite cost to reflect the extra disk I/O that
the materialize on-demand approach has to make. We expect that the materialize on-demand approach
will give better performance than the classic virtual strategy in cases where $f_a(v_i) > f_u(v_i)$.
Furthermore, we can see that in order to choose between the materialized and materialized on-demand
policies, we should consult the weighted ratio of accesses to updates, $\lambda = \alpha \times f_a(v_i) / f_u(v_i)$. A ratio $\lambda > 1$
would suggest that the materialized approach is better, otherwise a materialized on-demand or a
virtual strategy would be expected to yield better performance.
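The approximations and the ratio test can be tried out with a short Python sketch. It assumes that the weighted ratio of accesses to updates is $\alpha \times f_a(v_i) / f_u(v_i)$, which is what comparing the approximate materialized cost with the approximate virtual cost gives. The function names and the value of $\alpha$ in the usage lines are hypothetical, since the paper only states that $\alpha > 1$ and is platform-dependent.

```python
from typing import Dict


def approx_costs(f_a: float, f_u: float, c_gen: float,
                 alpha: float, c_over: float = 0.0) -> Dict[str, float]:
    """Approximate per-policy costs when only c_gen is significant."""
    f_m = min(f_a, f_u)
    return {
        "mat": f_u * c_gen,
        "virt": alpha * f_a * c_gen,
        "mod": alpha * f_m * c_gen + c_over,
    }


def weighted_ratio(f_a: float, f_u: float, alpha: float) -> float:
    """Weighted ratio of accesses to updates; infinite when there are no updates."""
    if f_u == 0:
        return float("inf")
    return alpha * f_a / f_u


# Rates in the range used in the experiments (3.5 requests/min, 30 updates/min,
# c_gen = 0.5 s); alpha = 2 is an arbitrary illustrative value.
costs = approx_costs(f_a=3.5, f_u=30, c_gen=0.5, alpha=2.0)
ratio = weighted_ratio(f_a=3.5, f_u=30, alpha=2.0)
```

With this arbitrary weight the ratio falls below 1, so the approximation would favour deferring the work to query time. A larger $\alpha$ shifts the balance toward materialization, which is why the weight has to be calibrated for each platform.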

4  Experiments 

For our experiments we used two machines, a SUN UltraSparc-5 with 320MB of memory, running
Solaris 2.6, and an AlphaStation 255 with 64MB of memory, running Digital Unix V4.0. The web
server, Apache version 1.3.6 (http://www.apache.org), ran concurrently with the update process
on the SUN machine, while the clients were running on the Alpha. Both machines were on the
same local area network in order to eliminate (uncontrollable) network latency from our experiments.
In every experiment, each client would read a set of queries from a script, send the requests
to the web server and wait for the reply, measuring the elapsed time for each query (averaged over
multiple runs).

Workload Our workload consisted of 100 WebViews. Their access rates followed the Zipf distribution
with a theta of 0.7, as suggested in [BCF+99]. The total accesses to the web server averaged
about 12 requests per second. This should correspond to a quite heavy load on the server, of about
1 million hits per day. For comparison, our departmental web server (http://www.cs.umd.edu)
gets about 70,000 requests a day, which corresponds to only about 0.8 requests per second.
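A workload of this shape can be generated with a few lines of Python. The sketch assumes the common Zipf-like form in which the access probability of the WebView with popularity rank $i$ is proportional to $1/i^{\theta}$. The function name `zipf_rates` is ours, and the exact generator used in the experiments is not specified in the text.

```python
from typing import List


def zipf_rates(n_views: int, theta: float, total_rate: float) -> List[float]:
    """Split total_rate over n_views so that rank i gets a share proportional to 1 / i**theta."""
    weights = [1.0 / (rank ** theta) for rank in range(1, n_views + 1)]
    norm = sum(weights)
    return [total_rate * w / norm for w in weights]


# 12 requests per second over 100 WebViews, expressed in requests per minute.
rates_per_min = zipf_rates(n_views=100, theta=0.7, total_rate=12 * 60)

# Arithmetic behind the "about 1 million hits per day" figure.
hits_per_day = 12 * 60 * 60 * 24
```

The per-view rates sum to the 720 requests per minute implied by 12 requests per second, and `hits_per_day` evaluates to 1,036,800, consistent with the figure quoted above.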

While the access rate for each WebView was kept the same for all experiments, we varied the
update rate and the materialization policy for 10 out of the 100 WebViews, our test group. The
remaining 90 WebViews had no updates at all, were always materialized and played the role of a
“background” load to the server. The sizes for the WebViews were on average 30KB ([AW97]) and
the calculation cost was rather small, 0.5 seconds.

Experiments The test group in our experiments consisted of 10 WebViews in the “middle” of the
access rate distribution (with access rates about 3-4 requests per minute). We varied the update rate
from 0 up to 30 updates per minute for each of the 10 WebViews. For each update rate, we ran three
different experiments, one for each materialization policy.


Figure  1 :  Total  avg  query  response  time 

We  plot  the  average  query  response  time  for  all  queries  (including  the  ones  in  the  test  group)  in 
Fig.  1.  The  x-axis  is  the  update  rate  for  the  test  group,  in  updates/min,  whereas  the  y-axis  is  the  total 
average  query  response  time,  in  seconds.  The  mat  line  corresponds  to  the  case  where  all  WebViews 
from  the  test  group  are  kept  materialized  (i.e.  the  updates  cause  all  WebViews  to  be  refreshed  in  the 
background),  the  virt  line  to  the  virtual  case  (i.e.  the  query  result  is  recomputed  on  every  request) 
and  the  mod  line  corresponds  to  the  case  where  all  WebViews  in  the  test  group  are  materialized 
on-demand  (i.e.  the  query  result  is  recomputed  on  request,  but  also  saved  for  future  use  and  updates 
invalidate  the  saved  copy). 

We see from Fig. 1 that in the virtual case, the overall performance is not affected by the update
rate (i.e. the virt line is almost flat), as expected. On the other hand, the materialize on-demand
policy, depending on the interleaving of updates and requests, can perform better than
the virtual approach, since it re-uses pre-computed results as much as possible, whereas the
virtual approach blindly recomputes each WebView on every request. Furthermore, if we look at
the average query response time for only the WebViews in the test group (Fig. 2), we verify that the
materialize on-demand policy outperforms the virtual strategy when the update rate is less than the
access rate (first two points in the graph).

Figure 2: Test group avg query response time


Figure 3: (All - Test group) avg query response time

From Figures 1 and 2 we see that the materialized policy performs really well, even for update
rates far exceeding the access rate, although, eventually, the virtual and the materialized on-demand
strategies are expected to perform better for very high update rates. So where do the savings for the
materialized strategy come from? We plot in Fig. 3 the average query response time of all queries
except for those in the test group. From Fig. 3, we see that the reason for this great performance
is that the materialized policy “penalizes” the rest of the views, by slightly increasing their query
response times (since the updates done in the background increase the load at the server). On the
other hand, the performance of the rest of the views is almost unaffected under the materialized
on-demand and virtual approaches, since the cost of the updates is inflicted on the query response
time of the updated WebView.

5  Conclusions 

In this paper, we have introduced the materialize on-demand policy for WebViews, which combines
the materialized and virtual strategies. We also formulated the WebView materialization problem,
and described a cost model to help decide among the three materialization strategies (materialized,
virtual, materialized on-demand). Our experiments showed that the materialized policy usually
leads to better performance, at the expense, however, of the other WebViews. On the other hand, if
the update rate is really high compared to the access rate, the virtual and materialized on-demand
strategies have better overall performance than the materialized policy, since they defer the updates
until the time of the query. Finally, the materialized on-demand strategy outperforms the virtual policy
when the access rate is higher than the update rate, since it avoids recomputation of the WebView
when there are no updates.

We are currently implementing the materialized on-demand policy for the web server of the
AMASE project (http://amase.gsfc.nasa.gov). As part of our future work, we want to drive
our experiments with trace data from a commercial web server.